heterogeneous sources

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for the draft copy at present; they will be replaced with the correct numbers when the final book is formatted. Chapter numbers are correct and will not change.

We say that data comes from heterogeneous sources if it originates in different organisations, or from different kinds of data sources. The data itself may be heterogeneous, mixing numeric and symbolic data, but even the same data value may be represented differently: for example, in different date formats, or with names given as a whole in some places (e.g. "Alan Dix") and coded in parts in others (e.g. {given:"Alan",family:"Dix"}). This creates many challenges. One needs either to transform all the data sources into a single standard format, often involving substantial data wrangling, or to create some form of mapping between the data formats. Connecting the different data sources is also a challenge: the different name formats may make the same name appear differently, and the same name in two data sets may refer to different people or things.
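As a rough illustration of the first option (transforming every source into one standard format), the following Python sketch normalises the two name representations and two date layouts into a single shape. The field name "joined", the list of date layouts, and the rule that the last word of a whole name is the family name are all assumptions made for this example, not part of any particular system.

```python
from datetime import date, datetime

# Hypothetical date layouts that the sources are assumed to use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_name(value) -> dict:
    """Return a name in the coded form {'given': ..., 'family': ...}."""
    if isinstance(value, dict):
        # Already coded in parts: just tidy the whitespace.
        return {"given": value["given"].strip(), "family": value["family"].strip()}
    # Naive assumption: the last word of a whole name is the family name.
    given, _, family = value.strip().rpartition(" ")
    return {"given": given, "family": family}


def normalise_date(text: str) -> date:
    """Try each assumed layout in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def normalise_record(record: dict) -> dict:
    """Map one source record onto the single standard format."""
    return {
        "name": normalise_name(record["name"]),
        "joined": normalise_date(record["joined"]),
    }


# Two records for the same person, as two different sources might hold them.
source_a = {"name": "Alan Dix", "joined": "03/02/2020"}
source_b = {"name": {"given": "Alan", "family": "Dix"}, "joined": "2020-02-03"}

records_match = normalise_record(source_a) == normalise_record(source_b)
```

Here records_match is True because both records reduce to the same standard form. Two caveats apply. A date such as "03/02/2020" is ambiguous between day-first and month-first conventions, so the layout has to be known for each source rather than guessed from the value. And matching normalised names only shows that the names agree, not that the records refer to the same person.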

Used on Chap. 10: page 202